America) with this computer program, which only writes letters beneath one another as
optimally as possible (hence sequence comparison or alignment). The decisive prerequisite
for this is that one knows the results and correctly understands their biological meaning,
and this is precisely the work of the bioinformatician.
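
To make the idea of writing letters beneath one another "as optimally as possible" concrete, here is a minimal sketch of a global alignment by dynamic programming (the Needleman-Wunsch approach). It is not the BLAST algorithm, which is a faster heuristic for local alignments. The function name `global_align` and the scoring values (match +1, mismatch -1, gap -2) are illustrative assumptions and do not come from the text.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (score, aligned_a, aligned_b).

    Scoring values are illustrative assumptions, not a biological substitution matrix.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for placing a[i-1] beneath b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # letter beneath letter
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    s, top, bottom = global_align("HEAGAWGHEE", "PAWHEAE")
    print(s)
    print(top)
    print(bottom)
```

The two returned strings have equal length and can be printed one beneath the other, with `-` marking a gap. Real protein comparisons would replace the simple match/mismatch scores with a substitution matrix, and judging what the resulting alignment means biologically remains the interpretive step described above.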
1.2 Understanding Data
What do you actually have to pay special attention to if, for example, you now perform
such sequence comparisons yourself? It is important to know that the BLAST search is a
heuristic: it is not completely exact, but it delivers results much faster than a full
comparison over the entire sequence length against every database entry. Such hits are
therefore only credible if the probability of getting them by chance is low enough. As a
first rule of thumb, remember: the E-value (the number of hits of this quality expected by
chance) should be less than 1 in one million (below 10^-6). That is already a very
convincing value. In borderline cases (E-value around 1 in 1000), you can also take the hit
sequence as a new query and check whether the search finds the initial sequence again
(called a “reverse search” in technical jargon). If we keep in mind that BLAST is a local
search, we also understand why we should check how much of the sequence the hit covers
(in the given example, the similarity extends over the whole sequence length). But there
are also BLAST results where only one subsequence of the protein shows high similarity
and the rest shows none. In this case, the BLAST search has turned up only one protein
domain, the one with the highest similarity in the whole database. To determine the
function of the remaining parts of the sequence as well, you then need to search again
using only those regions that do not yet have database hits, without the first sequence part
1 Sequence Analysis: Deciphering the Language of Life